Systems and methods for analyzing documents using machine learning techniques

ABSTRACT

Systems and methods for activity risk management are disclosed. A system for activity risk management may include a memory storing instructions and at least one processor configured to execute instructions to perform operations including: classifying document data by identifying at least one marker in the document data, the at least one marker being associated with a document type; selecting an extraction model based on the document type; extracting model input data from the classified document data using the extraction model; applying a machine learning model to the extracted model input data to score the document data, the machine learning model having been trained with document data of a same document type as the document type associated with the at least one marker; and generating, based on the applying, a favorability output based on an amount of risk associated with the document data.

TECHNICAL FIELD

The present disclosure generally relates to computerized methods and systems for analyzing documents and, more particularly, to computerized systems and methods for using computerized modeling to analyze extracted document data and predict institutional risks.

BACKGROUND

In current environments, there are many areas where an organization may seek to have a degree of monitoring over particular activities of other organizations, especially when those activities have the potential for institutional risk (e.g., damage to an organization, harm to consumers, etc.). In some cases, human monitors attempt to identify institutional risk by gleaning information from documents of the organization. However, to identify these risks using current techniques, individuals must manually review thousands of pages of documents, sometimes failing to identify key risk-impacting information, and often failing to identify connections or correlations between documents. Sometimes, such manual review may be so error-prone or slow that an institutional risk is not identified or mitigated before being realized by an institution. Moreover, such manual review can make it difficult to identify trends within an organization that may indicate a change in institutional risk. In many cases, important documents are scattered across multiple physical locations, requiring larger amounts of manpower to perform a complete review. Even in cases where rudimentary computerized systems are used to aid document review, such systems operate inefficiently, such as by not fully understanding a particular document type or subject matter, an understanding of which can aid in risk analysis.

In other environments, an organization may seek to have a degree of monitoring over its own activities, to identify institutional risks to its own operations. However, in these instances, organizations often suffer from the drawbacks discussed above. Moreover, an organization may benefit from analysis of documents to identify institutional risks using data aggregated from multiple organizations, such as from other organizations operating in a similar industry, but this may be hindered by difficulty sharing documents that include personally identifiable information (PII).

In some cases, organizations may receive large amounts of analysis information that includes unneeded or ill-formatted information. When received through a computer network, such unneeded information burdens network bandwidth. Additionally, ill-formatted information may be unusable by an organization, or may unnecessarily burden processing resources to convert into a usable format.

Therefore, a need exists in the institutional risk management industry to provide customizable, correctly tailored, rapid, and accurate risk analysis information. The present disclosure is directed to addressing these and other challenges.

SUMMARY

One aspect of the present disclosure is directed to a computer-implemented system for entity risk management. The system comprises a non-transitory computer-readable medium configured to store instructions and at least one processor configured to execute the instructions to perform operations. The operations include establishing a connection between the system and a data source, the data source being remote from the system and associated with a first entity; receiving first institution data from the data source; extracting model input data from the institution data using a natural language processing (NLP) classifier; applying a machine learning model to the extracted model input data to predict a risk level associated with the first entity, the machine learning model having been trained to predict risk levels using second institution data; generating analysis data based on the predicted risk level; and based on the analysis data, transmitting an alert to a management device communicably connected to the system.

Another aspect of the present disclosure is directed to a computer-implemented system for activity risk management. The system comprises a non-transitory computer-readable medium configured to store instructions and at least one processor configured to execute the instructions to perform operations. The operations include accessing document data associated with at least one of a transaction or an individual; normalizing the document data; classifying the normalized document data; extracting model input data from the classified document data; applying a machine learning model to the extracted model input data to score the document data, the machine learning model having been trained to generate a favorability output indicating a favorability of the transaction or individual; and generating analysis data based on the scored document data.
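By way of illustration only, the sequence of operations in this aspect (normalizing, classifying, extracting, scoring, and generating a favorability output) might be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the marker phrases, the field patterns, the function names, the 0.5 cutoff, and the ratio-based `score` function (a stand-in for a trained machine learning model) are all hypothetical.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical markers: phrases assumed to signal a document type.
MARKERS: Dict[str, str] = {
    "loan application": "loan_application",
    "account application": "account_application",
}


def normalize(raw: str) -> str:
    """Lowercase the text and collapse whitespace so markers match reliably."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def classify(text: str) -> str:
    """Return the document type of the first marker found, else 'unknown'."""
    for marker, doc_type in MARKERS.items():
        if marker in text:
            return doc_type
    return "unknown"


def _money(text: str, label: str) -> float:
    """Read a dollar figure that follows 'label:'; 0.0 when it is absent."""
    match = re.search(label + r": \$?([\d,]+(?:\.\d+)?)", text)
    return float(match.group(1).replace(",", "")) if match else 0.0


def extract_loan_fields(text: str) -> Dict[str, float]:
    """Hypothetical extraction model for loan applications."""
    return {"amount": _money(text, "amount"), "income": _money(text, "income")}


# One extraction model per document type (only one is sketched here).
EXTRACTORS: Dict[str, Callable[[str], Dict[str, float]]] = {
    "loan_application": extract_loan_fields,
}


def score(features: Dict[str, float]) -> float:
    """Stand-in for a trained model: risk grows with amount-to-income ratio."""
    if features["income"] <= 0:
        return 1.0
    return min(1.0, features["amount"] / (5 * features["income"]))


def analyze(raw: str) -> Dict[str, Optional[object]]:
    """Run normalize -> classify -> extract -> score -> favorability."""
    text = normalize(raw)
    doc_type = classify(text)
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        return {"document_type": doc_type, "risk": None, "favorability": "unscored"}
    risk = score(extractor(text))
    favorability = "favorable" if risk < 0.5 else "unfavorable"
    return {"document_type": doc_type, "risk": risk, "favorability": favorability}
```

In this sketch, a call such as `analyze("LOAN  Application\nAmount: $50,000\nIncome: $40,000")` would classify the text as a loan application and score it with the stand-in function; a document with no recognized marker is returned unscored rather than forced through an unrelated extraction model.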

Another aspect of the present disclosure is directed to a computer-implemented system for providing selective access to model output data. The system comprises a non-transitory computer-readable medium configured to store instructions and at least one processor configured to execute the instructions to perform operations. The operations include receiving, through an application programming interface (API) and from a requestor device, an API request for data, the API request identifying a requestor entity associated with the requestor device; determining a data type based on the API request; determining an authorization level of the requestor; accessing first model output data corresponding to the data type and the authorization level, the first model output data having been generated by a machine learning model trained to predict a risk level based on document data; and transmitting the first model output data to the requestor device.
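As a non-limiting illustration, the selective-access operations above might be sketched as a filter over stored model output keyed by data type and authorization level. The record layout, the requestor identifiers, the numeric levels, and the function name `handle_request` are hypothetical and are assumed here only for the sake of the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelOutput:
    data_type: str   # e.g., "risk_summary" or "risk_detail" (hypothetical)
    min_level: int   # lowest authorization level allowed to read the record
    payload: dict    # output previously generated by a risk model


# Hypothetical store of previously generated model output.
STORE: List[ModelOutput] = [
    ModelOutput("risk_summary", 1, {"entity": "bank-a", "risk": "elevated"}),
    ModelOutput("risk_detail", 2, {"entity": "bank-a", "factor": "liquidity"}),
]

# Hypothetical mapping from requestor entity to authorization level.
REQUESTOR_LEVELS: Dict[str, int] = {"examiner-1": 2, "bank-a": 1}


def handle_request(request: dict) -> List[dict]:
    """Return only the output matching the requested type and the level."""
    data_type = request["data_type"]
    # Unknown requestors get level 0, which matches no record in this store.
    level = REQUESTOR_LEVELS.get(request["requestor_id"], 0)
    return [
        record.payload
        for record in STORE
        if record.data_type == data_type and record.min_level <= level
    ]
```

Filtering before transmission, rather than at the requestor device, means that data the requestor is not authorized to see (or did not ask for) never crosses the network.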

Other aspects of the present disclosure are directed to methods for performing the functions of the computer-implemented systems discussed above.

Other systems, methods, and computer-readable media are also discussed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of an example system architecture for predicting risk, consistent with the disclosed embodiments.

FIG. 2 is a block diagram of an example server for predicting risk, consistent with the disclosed embodiments.

FIG. 3 is a block diagram of an example user device, consistent with the disclosed embodiments.

FIG. 4 is a flowchart of an example process for predicting institutional risk, consistent with the disclosed embodiments.

FIG. 5 is a flowchart of an example process for analyzing document data, consistent with the disclosed embodiments.

FIG. 6 is a flowchart of an example process for coordinating analysis data delivery access, consistent with the disclosed embodiments.

FIGS. 7A, 7B, 7C, and 7D depict example interfaces presented on user device 300, consistent with the disclosed embodiments.

FIG. 8 depicts an example diagram of a borrower state transition model, consistent with the disclosed embodiments.

DETAILED DESCRIPTION

The disclosed embodiments include systems and methods for processing financial transactions. Before explaining certain embodiments of the disclosure in detail, it is to be understood that the disclosure is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The disclosure is capable of embodiments in addition to those described and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein, as well as in the accompanying drawings, are for the purpose of description and should not be regarded as limiting.

As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for designing other structures, methods, and systems for carrying out the several purposes of the present disclosure.

Reference will now be made in detail to the present example embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.

FIG. 1 is a schematic diagram illustrating an example system architecture 100 for predicting risk, consistent with the disclosed embodiments. For example, system architecture 100 may predict risk related to one or more institutions, such as a bank, a lender, a check clearinghouse, a financial advisement entity, a business (e.g., an automobile dealership), a hospital, a healthcare provider, or other organization. As discussed below, system architecture 100 may analyze document data to predict associated risks. Devices within system architecture 100 may include at least one module (examples of which are discussed below), which may include a program, code, model, workflow, process, thread, routine, coroutine, function, or other processing element for predicting outcomes based on document data.

In some embodiments, system architecture 100 may include a financial transaction system 102, which may exist fully or partially within a bank or other institution. While this system has been termed a financial transaction system, this term is merely exemplary, as embodiments exist where financial transaction system 102 may be associated with financial information not related to transactions, or may be related to information not related to finance. In some embodiments, financial transaction system 102 may include at least one processing device 104, which may be an instance of server 200 and/or user device 300. Processing device 104 may carry out all or any portion of the processes described herein. In some embodiments, financial transaction system 102 may include multiple processing devices 104, which may be communicably coupled through any kind of suitable wired and/or wireless local area network (LAN). In some embodiments, financial transaction system 102 may also utilize cloud computing technologies (e.g., for storage, caching, or the like).

In some embodiments, processing device 104 may include a risk advisor module 106, which may be stored in memory 230 or memory 330 (discussed further below). In some embodiments, risk advisor module 106 may be configured to carry out all or part of process 400, described below. In some embodiments, risk advisor module 106 may provide analysis information and/or recommendations, discussed below, to a device within financial transaction system 102. For example, processing device 104 may provide analysis results to risk advisor module 106.

In some embodiments, processing device 104 may include a document advisor module 108, which may be stored in memory 230 or memory 330 (discussed further below). In some embodiments, document advisor module 108 may be configured to carry out all or part of process 500, described below. In some embodiments, document advisor module 108 may be configured to examine a particular type of document, such as a loan application paper. In some embodiments, risk advisor module may provide analysis information, including recommendations, discussed below, to a device within financial transaction system 102.

While shown within the same processing device 104 as risk advisor module 106, it should be noted that risk advisor module 106 and document advisor module 108 may be present on separate processing devices 104. Moreover, a processing device 104 may include multiple risk advisor modules 106, document advisor modules 108, or any other module configured for implementing part of a process discussed herein. For example, a processing device 104 may include multiple document advisor modules 108 associated with examining different types of documents (e.g., loan applications, account applications, withdrawal requests, transfer requests, personnel documents, etc.).

In some embodiments, financial transaction system 102 may be communicably connected with activity analysis platform 110. For example, financial transaction system 102 may connect with activity analysis platform 110 through network 120. Network 120 may be a public or private network, and may include, without limitation, any combination of a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network, an Institute of Electrical and Electronics Engineers (IEEE) 802.11 wireless network (e.g., “Wi-Fi”), wired network, a network of networks (e.g., the Internet), a land-line telephone network, a fiber optic network, and/or a cellular network. Network 120 may be connected to other networks (not depicted in FIG. 1) to connect the various system components to each other and/or to external systems or devices. In some embodiments, network 120 may be a secure network and require a password to access the network, or a portion of the network.

In some embodiments, system architecture 100 may include an activity analysis platform 110, which may be associated with generating analysis based on document data. In some embodiments, activity analysis platform 110 may include at least one processing device 114, which may be a server 200 and/or user device 300. Processing device 114 may carry out all or any portion of the processes described herein. In some embodiments, activity analysis platform 110 may include multiple processing devices 104, which may be communicably coupled through any kind of suitable wired and/or wireless local area network (LAN). In some embodiments, activity analysis platform 110 may also utilize cloud computing technologies (e.g., for storage, caching, or the like).

In some embodiments, processing device 114 may include a virtual audit module 116, which may be stored in memory 230 or memory 330 (discussed further below). In some embodiments, virtual audit module 116 may be configured to carry out all or part of process 400, described below. In some embodiments, risk advisor module may provide analysis information and/or recommendations, discussed below, to a device within financial transaction system 102. In some embodiments, virtual audit module 116 may aggregate document data from multiple sources (e.g., multiple financial transaction systems 102) and may perform risk analysis based on data from a single source or aggregated from multiple sources. In some embodiments, virtual audit module 116 may operate periodically or continually, to regularly monitor organizations as new documents are examined. In some embodiments, virtual audit module 116 may determine that a risk analysis result satisfies an alert threshold and may transmit an alert to a device in system architecture 100.
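By way of example only, the aggregation and alert-threshold behavior described for virtual audit module 116 might be sketched as follows. The threshold value, the use of a simple mean as the aggregate, and the function names are assumptions made for illustration; in practice the risk analysis result would come from a trained model rather than an average.

```python
from statistics import mean
from typing import Dict, List

ALERT_THRESHOLD = 0.7  # hypothetical threshold on a 0-to-1 risk scale


def aggregate_risk(scores_by_source: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each source's per-document risk scores to a single value."""
    return {
        source: mean(scores)
        for source, scores in scores_by_source.items()
        if scores  # skip sources that have produced no scores yet
    }


def sources_to_alert(scores_by_source: Dict[str, List[float]]) -> List[str]:
    """Return the sources whose aggregated risk satisfies the threshold."""
    aggregated = aggregate_risk(scores_by_source)
    return sorted(s for s, risk in aggregated.items() if risk >= ALERT_THRESHOLD)
```

A periodic job could call `sources_to_alert` each time new documents are scored and transmit an alert for each returned source.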

In some embodiments, processing device 114 may include an examination assistant module 118, which may be stored in memory 230 or memory 330 (discussed further below). In some embodiments, examination assistant module 118 may be configured to carry out all or part of process 400, described below. In some embodiments, examination assistant module 118 may provide particularized analysis information and/or recommendations, which may be based on user input. In some embodiments, examination assistant module 118 may include a machine learning model that learns a user's (e.g., financial examiner's) preferences over time and adjusts analysis and/or display parameters in response. By way of example, a machine learning model may learn over time that a particular user (e.g., as identified by particular user credentials used at processing device 114) prefers to access particular types of documents when examining data underlying risk predictions, and may score the document types according to frequency of access, order of access, screen time spent on a particular document type, etc. Based on these learned preferences, the machine learning model may provide a list of documents to the user, where the documents are ranked according to strength of user preference scores. Additionally or alternatively, processing device 114 may provide certain analysis results using examination assistant module 118, which may be configured to provide charts, maps, lists, filters, or other tools for allowing a user to examine results (e.g., number of new loans over time, total assets over time, a relatively fast rate of change to an entity metric, a close timing between two events, etc.).
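As one hedged illustration of the preference scoring described above, document types might be scored from access frequency and screen time and then used to rank a document list. The fixed weighted sum below is a simple heuristic substituted for a learned model, and the weights, event format, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def preference_scores(
    events: Iterable[Tuple[str, float]],
    freq_weight: float = 1.0,    # hypothetical weight per access
    time_weight: float = 0.01,   # hypothetical weight per second on screen
) -> Dict[str, float]:
    """Score each document type from (document_type, seconds_viewed) events."""
    counts: Dict[str, int] = defaultdict(int)
    seconds: Dict[str, float] = defaultdict(float)
    for doc_type, viewed in events:
        counts[doc_type] += 1
        seconds[doc_type] += viewed
    return {
        doc_type: freq_weight * counts[doc_type] + time_weight * seconds[doc_type]
        for doc_type in counts
    }


def rank_documents(documents: List[dict], scores: Dict[str, float]) -> List[dict]:
    """Order documents by the preference score of their type, highest first.

    Types the user has never accessed score 0.0; ties keep their input order.
    """
    return sorted(
        documents,
        key=lambda doc: scores.get(doc["type"], 0.0),
        reverse=True,
    )
```

Order of access, also mentioned above as a possible signal, is omitted from this sketch for brevity.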

System architecture 100 may also include a 3rd party data provider 130, which may store data that can be used by a tool (e.g., document data analyzer 232), consistent with disclosed embodiments. In some embodiments, 3rd party data provider 130 may store data related to a particular field, such as demographics or economics. By way of example, 3rd party data provider 130 may store statistics from the United States Department of Labor, such as statistics relating to employment or income. In some embodiments, a device within system architecture 100 may periodically extract up-to-date data from 3rd party data provider 130, such that a module may have more accurate datasets, which can be used as input data for a module (e.g., model for predicting institutional risk, predicting favorability of a transaction or individual, etc.). In some embodiments, activity analysis platform 110 may be configured to download data from multiple 3rd party data providers 130 (e.g., using multiple data intake modules) and standardize the downloaded data into a format usable by a machine learning model (e.g., for use in process 400). A 3rd party data provider 130 may also connect to activity analysis platform 110 through network 120.
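As a minimal sketch of the standardization step, each data intake module might be an adapter that maps one provider's rows into a common record layout. The provider names, field names, and record layout below are hypothetical and do not reflect any actual provider's format.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def from_labor_feed(row: dict) -> Record:
    """Hypothetical intake module: converts a percentage into a fraction."""
    return {
        "region": row["area"],
        "metric": "unemployment_rate",
        "value": float(row["rate_pct"]) / 100.0,
    }


def from_income_feed(row: dict) -> Record:
    """Hypothetical intake module: renames fields and parses the amount."""
    return {
        "region": row["state"],
        "metric": "median_income",
        "value": float(row["median_income_usd"]),
    }


# One intake module per provider, keyed by a hypothetical provider name.
INTAKE_MODULES: Dict[str, Callable[[dict], Record]] = {
    "labor_feed": from_labor_feed,
    "income_feed": from_income_feed,
}


def standardize(downloads: Dict[str, List[dict]]) -> List[Record]:
    """Convert rows from every provider into the common record layout."""
    records: List[Record] = []
    for provider, rows in downloads.items():
        adapter = INTAKE_MODULES[provider]  # KeyError flags an unknown provider
        records.extend(adapter(row) for row in rows)
    return records
```

Keeping one adapter per provider means that a change in one provider's format touches only that adapter, while downstream model input code sees a single layout.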

FIG. 2 is a block diagram of an example server 200 used in system architecture 100, consistent with the disclosed embodiments. For example, server 200 may be used in financial transaction system 102 or activity analysis platform 110. Server 200 may be one or more computing devices configured to execute software instructions stored in memory to perform one or more processes consistent with the disclosed embodiments. For example, server 200 may include one or more memory devices for storing data and software instructions and one or more hardware processors to analyze the data and execute the software instructions to perform server-based functions and operations (e.g., back-end processes). In some embodiments, server 200 may be a virtual processing device (e.g., a virtual machine or a container), which may be spun up or spun down to satisfy processing criteria of financial transaction system 102, activity analysis platform 110, or other system.

In FIG. 2, server 200 includes a hardware processor 210, an input/output (I/O) device 220, and a memory 230. It should be noted that server 200 may include any number of those components and may further include any number of any other components. Server 200 may be standalone, or it may be part of a subsystem, which may be part of a larger system. For example, server 200 may represent distributed servers that are remotely located and communicate over a network.

Processor 210 may include one or more known processing devices, such as, for example, a microprocessor. In some embodiments, processor 210 may include any type of single or multi-core processor, mobile device microcontroller, central processing unit, etc. In operation, processor 210 may execute computer instructions (e.g., program codes) and may perform functions in accordance with techniques described herein. Computer instructions may include routines, programs, objects, components, data structures, procedures, modules, and functions, which may perform particular processes described herein. In some embodiments, such instructions may be stored in memory 230, processor 210, or elsewhere.

I/O device 220 may be one or more devices configured to allow data to be received and/or transmitted by server 200. I/O device 220 may include one or more customer I/O devices and/or components, such as those associated with a keyboard, mouse, touchscreen, display, etc. I/O device 220 may also include one or more digital and/or analog communication devices that allow server 200 to communicate with other machines and devices, such as other components of system architecture 100. I/O device 220 may also include interface hardware configured to receive input information and/or display or otherwise provide output information. For example, I/O device 220 may include a monitor configured to display a user interface.

Memory 230 may include one or more storage devices configured to store instructions used by processor 210 to perform functions related to disclosed embodiments. For example, memory 230 may be configured with one or more software instructions associated with programs and/or data.

Memory 230 may include a single program that performs the functions of the server 200, or multiple programs. Additionally, processor 210 may execute one or more programs located remotely from server 200. Memory 230 may also store data that may reflect any type of information in any format that the system may use to perform operations consistent with disclosed embodiments. Memory 230 may be a volatile or non-volatile (e.g., ROM, RAM, PROM, EPROM, EEPROM, flash memory, etc.), magnetic, semiconductor, tape, optical, removable, non-removable, or another type of storage device or tangible (i.e., non-transitory) computer-readable medium.

Consistent with the disclosed embodiments, server 200 includes document data analyzer 232 configured to receive one or more documents, which in some embodiments may be received from a user device 300. For example, a user device 300 may upload one or more documents to a location accessible by server 200, such as by using a web portal or other interface. Also consistent with disclosed embodiments, server 200 may include statistic data analyzer 236, which may be configured to generate risk predictions, which may be based on model input data such as general ledger data. In some embodiments, document data analyzer 232 and/or statistic data analyzer 236 may be an application configured to operate a computerized model (e.g., a machine learning model). Document data analyzer 232 and/or statistic data analyzer 236 may be implemented as software (e.g., program codes stored in memory 230), hardware (e.g., a specialized chip incorporated in or in communication with processor 210), or a combination of both. Document data analyzer 232 and/or statistic data analyzer 236 may include any or all of the modules described herein.

In some embodiments, document data analyzer 232 may include an analysis model 234, which may be a model having a structure, parameters, and/or any other configuration elements for generating predictive data related to documents. In some embodiments, statistic data analyzer 236 may include an analysis model 238, which may be a model having a structure, parameters, and/or any other configuration elements for generating predictive data related to institutional risks. Analysis model 234 and/or 238 may be, without limitation, any of a computer software module, an algorithm, a machine learning model, a data model, a statistical model, a natural language processing (NLP) module, a k-nearest neighbors (KNN) model, a nearest centroid classifier model, a random forest model, an extreme gradient boosting model (XGBoost), a text clustering model, a recurrent neural network (RNN) model, a long short-term memory (LSTM) model, a convolutional neural network model, or another neural network model, consistent with disclosed embodiments. Analysis model 234 and/or 238 may be configured to predict performance of a single entity (e.g., bank) or multiple entities (e.g., multiple banks).

In some embodiments, a model (e.g., analysis model 234 and/or 238) may be a model in a learning stage or may have been trained to a degree (e.g., by a developer, a machine, or a combination of both). For example, training a model may include providing a model with model training input data, which may be unstructured or semi-structured (e.g., sourced from one or more documents) or structured (e.g., general ledger data, financial accounting metadata, etc., any of which may be from a bank). For example, statistic data analyzer 236 may receive input data that includes both structured and unstructured data, which may provide enhanced predictive performance. As another example, document data analyzer 232 may categorize one or more documents into high-level document types and may perform document analysis and extraction operations, consistent with disclosed embodiments, and as further detailed with respect to process 500. A model may use the model training input data to generate a model output (e.g., a risk level, contributing factors to a risk, a recommendation for reducing a risk, etc.). Model training input data may also not be associated with any specific document, and may be data from a general ledger of a bank. In some embodiments, a model may be trained using input data (e.g., document data, general ledger information, etc.) from a single source (e.g., a bank) or multiple sources (e.g., multiple banks). In some embodiments, such as where the training is supervised, a user may indicate an amount of accuracy of an output to the model (e.g., false positives, false negatives), which may be part of a recursive feedback loop to the model (e.g., as a subsequent input). In some embodiments, a developer may interact with a model to approve or disapprove of suggested changes to a model or parameters of a model (e.g., suggested by a machine). After such an interaction, the model may be updated to reflect the user interactions and/or machine inputs.
In some embodiments, a model may continue to train until an output metric is satisfied (e.g., a threshold number or percentage of organizational failures are correctly predicted, a threshold number or percentage of risks or risk elevations are identified, a portion of text is correctly identified, a threshold number or percentage of training documents are accurately classified, a threshold number or percentage of loan defaults are correctly predicted, a threshold number or percentage of general ledger accounts are classified or categorized, etc.). In some embodiments, different output metric thresholds may be used for different types of categories, which may enhance predictive performance. A category may be a document category (e.g., a loan application, a new account application, etc.) or other data category (e.g., type of general ledger information, such as cash flow statistics). In some embodiments, a model may be a meta-model (e.g., a model of multiple bank-specific models). A model may be configured to generate particular analysis data, described below.
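By way of illustration, training that continues until per-category output metric thresholds are satisfied might be sketched as the loop below. The callables `train_step` and `evaluate` are hypothetical placeholders for whatever training and evaluation routines a given model uses, and the round limit is an added safeguard assumed here so that the loop terminates even if a threshold is never reached.

```python
from typing import Callable, Dict, Tuple


def train_until(
    train_step: Callable[[], None],
    evaluate: Callable[[], Dict[str, float]],
    thresholds: Dict[str, float],
    max_rounds: int = 100,
) -> Tuple[int, Dict[str, float], bool]:
    """Train until every category's metric meets its own threshold.

    Returns (rounds_run, last_metrics, satisfied). A category missing from
    the evaluation result is treated as scoring 0.0.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    metrics: Dict[str, float] = {}
    for round_no in range(1, max_rounds + 1):
        train_step()          # one pass of training (placeholder)
        metrics = evaluate()  # e.g., accuracy per document category
        if all(metrics.get(cat, 0.0) >= t for cat, t in thresholds.items()):
            return round_no, metrics, True
    return max_rounds, metrics, False
```

For instance, `thresholds` might require a higher accuracy for loan applications than for new account applications, reflecting the use of different output metric thresholds for different categories.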

Server 200 may also be communicatively connected to one or more databases 240. For example, server 200 may be communicatively connected to database 240, which may be a database implemented in a computer system (e.g., a database server computer) in financial transaction system 102 and/or activity analysis platform 110. Database 240 may include one or more memory devices that store information and are accessed and/or managed through server 200. By way of example, database 240 may include Oracle™ databases, Sybase™ databases, or other relational databases or non-relational databases, such as Hadoop sequence files, HBase, or Cassandra. The databases or other files may include, for example, data and information related to the source and destination of a network request, the data contained in the request, etc. Systems and methods of disclosed embodiments, however, are not limited to separate databases. In one aspect, server 200 may include database 240. Alternatively, database 240 may be located remotely from the server 200. Database 240 may include computing components (e.g., database management system, database server, etc.) configured to receive and process requests for data stored in memory devices of database 240 and to provide data from database 240. Server 200 may also include a communication interface (not shown), which may be implemented in a manner similar to communication interface 350 (described below), and may allow server 200 to connect to another server 200 or a user device 300.

In an example, document data analyzer 232 may include instructions to call an API for analyzing document data associated with an organization (e.g., a bank). In some embodiments, the API may communicate with financial transaction system 102 to verify document information and/or request additional data (e.g., additional documents, confirmation of document information, etc.). In some embodiments, API communications may be transmitted (e.g., via a mobile device application, a text message, a phone call, or the like) to a user device 300 or another server 200 (e.g., a processing device 104) to be presented (e.g., displayed as text or a graph, or played as sound) to a user. The API communication may include a request for additional information, and may include one or more of, for example, a first name, last name, account name, phone number, email address, passphrase, document identification number, financial amount, date, type of financial product (e.g., a loan), or financial product condition (e.g., an interest rate).

FIG. 3 is a block diagram of an example user device 300 used in system architecture 100, consistent with the disclosed embodiments. As shown in FIG. 3, user device 300 may include a hardware processor 310, a user application 320, a memory 330, a user interface 340, and a communication interface 350. In some embodiments, processor 310 may be implemented in a manner similar to processor 210, and memory 330 may be implemented in a manner similar to memory 230.

Processor 310 may include a digital signal processor, a microprocessor, or another appropriate processor to facilitate the execution of computer instructions encoded in a computer-readable medium. Processor 310 may be configured as a separate processor module dedicated to predicting risk based on extracted document data. Alternatively, processor 310 may be configured as a shared processor module for performing other functions of user device 300 unrelated to the disclosed methods for predicting risk based on extracted document data. In some embodiments, processor 310 may execute computer instructions (e.g., program codes) stored in memory 330, and may perform functions in accordance with example techniques described in this disclosure.

Memory 330 may include any appropriate type of mass storage provided to store information that processor 310 may need to operate. Memory 330 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or another type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 330 may be configured to store one or more computer programs that may be executed by processor 310 to perform the disclosed functions for predicting risk based on extracted document data.

User application 320 may be a module dedicated to performing functions related to predicting risk based on extracted document data (e.g., modifying model parameters, validating accuracy of model output, specifying a model objective, etc.). User application 320 may be configured as hardware, software, or a combination thereof. For example, user application 320 may be implemented as computer code stored in memory 330 and executable by processor 310. As another example, user application 320 may be implemented as a special-purpose processor, such as an application-specific integrated circuit (ASIC), dedicated to making an electronic payment. As yet another example, user application 320 may be implemented as an embedded system or firmware, and/or as part of a specialized computing device.

User interface 340 may include a graphical interface (e.g., a display panel), an audio interface (e.g., a speaker), or a haptic interface (e.g., a vibration motor). For example, the display panel may include a liquid crystal display (LCD), a light-emitting diode (LED), a plasma display, a projection, or any other type of display. The audio interface may include a microphone, speaker, and/or audio input/output (e.g., headphone jack).

User interface 340 may also be configured to receive input or commands from a user. For example, the display panel may be implemented as a touch screen to receive input signals from the user. The touch screen includes one or more touch sensors to sense touches, swipes, and other gestures on the touch screen. The touch sensors may sense not only a boundary of a touch or swipe action but also a period of time and a pressure associated with the touch or swipe action. Alternatively, or additionally, user interface 340 may include other input devices such as keyboards, buttons, joysticks, and/or trackballs. User interface 340 may be configured to send the user input to processor 310 and/or user application 320 (e.g., an electronic transaction application).

Communication interface 350 can access a network (e.g., network 120) based on one or more communication standards, such as WiFi, LTE, 2G, 3G, 4G, 5G, etc. Communication interface 350 may connect user device 300 to another user device 300 or a server 200. For example, communication interface 350 may connect one processing device to another (e.g., connect processing device 104 to another processing device 104, connect processing device 104 to processing device 114, etc.). In some embodiments, communication interface 350 may include a near field communication (NFC) module to facilitate short-range communications between user device 300 and other devices. In other embodiments, communication interface 350 may be implemented based on radio-frequency identification (RFID) technology, an infrared data association (IrDA) technology, an ultra-wideband (UWB) technology, a Bluetooth® technology, or other technologies.

FIG. 4 is a flowchart of example process 400 for predicting institutional risk, consistent with the disclosed embodiments. Process 400 may be performed by a computer-implemented system (e.g., server 200) in financial transaction system 102 or activity analysis platform 110, or by an apparatus (e.g., user device 300). The computer-implemented system may include a memory (e.g., memory 230 or 330) that stores instructions and a processor (e.g., processor 210 or 310) programmed to execute the instructions to implement process 400. Process 400 may involve generating and/or displaying certain user interfaces, such as those shown in FIGS. 7A-7D (e.g., at step 414). Process 400 may be implemented as one or more software modules (e.g., an API in statistic data analyzer 236) stored in memory 230 and executable by processor 210. For ease of description, some steps of process 400 are described as performed by a particular device, such as processing device 114. However, it should be noted that any step may be executed by any device within system architecture 100, such as processing device 104. Process 400 may incorporate aspects of steps from other processes discussed herein. For example, providing the analysis results at step 410 may include aspects of providing analysis results as described with respect to step 512 of process 500.

Referring to process 400 shown in FIG. 4, at exemplary step 402, processing device 114 may receive institution data, which may be from a data source, and which may have been generated by a user device 300. Prior to receiving the institution data, processing device 114 may establish a connection between a system and a data source. In some embodiments, the data source (e.g., financial transaction system 102) may be remote from a system (e.g., activity analysis platform 110) and may also be associated with a first entity (e.g., a particular bank, lender, financial advisor, other financial institution, business, etc.). In some embodiments, processing device 114 may receive first institution data from a first entity (e.g., a bank), second institution data from a second entity, etc. In some embodiments, the first and second entities may be different financial institutions (e.g., banks), or other types of organization. In some embodiments, processing device 114 may receive institution data periodically (e.g., once every day, once every month, etc.) and/or in response to a request (e.g., a request sent from processing device 114 to processing device 104). In some embodiments, processing device 114 may transmit requests for institution data more frequently for institutions that have a higher amount of predicted risk. In some embodiments, processing device 114 may require different types of institution data at different frequencies. For example, processing device 114 may receive an institution's accounts receivable subledger more frequently (e.g., daily) than an institution's fixed assets subledger (e.g., every two days). In this manner, networked devices may reduce bandwidth load created by transmission of unnecessary or repetitive data for a particular process (e.g., process 400).

In some embodiments, the institution data may be associated with a particular industry, such as financial services. For example, institution data may be associated with (e.g., may include) a general ledger, combination of subledgers (e.g., accounts receivable, accounts payable, fixed assets, etc.), statement of financial position, and/or income statement, any of which may be generated into structured data by an application at a processing device (e.g., processing device 104). As other non-limiting examples, institution data may be associated with (e.g., may include) loan history data for one or more loans, a financial asset, a financial liability, a deposit amount, net income during a time period, earnings during a time period, a loan type (e.g., mortgage, car loan, etc.), loan origination date, loan period, an amount of principal originated, a payment received, a late charge, a number of days past due, a call code, a credit score, North American Industry Classification System (NAICS) data, etc.

Institution data may include semi-structured and/or structured data. As an example of semi-structured data, institution data may include loan data that identifies loan types, loan amounts, and loan origination dates for a plurality of loans within a set of fields, but does not conform to a data structure that processing device 114 (or a system) is configured to accept as a valid input (e.g., for input to a data extraction process). In some embodiments, processing device 114 may convert semi-structured data into structured data usable for process 400 (e.g., implemented by statistic data analyzer 236). As an example of structured data, institution data may include a table or other data structure (e.g., Portable Document Format (PDF) file, Extensible Markup Language (XML) file) with data elements describing financial metrics of an institution (e.g., a total amount of assets, a total amount of liabilities, an amount of cashflow of actual payments received, an amount of scheduled cashflow, etc.). Such institution data may have been user-generated (e.g., at a user device 300), or machine-generated (e.g., generated automatically in response to a system receiving an electronic payment, issuing a loan, etc.).

Referring again to process 400, at exemplary step 404, processing device 114 may extract model input data, which may be extracted from institution data. In some embodiments, processing device 114 may implement a machine learning model that applies a natural language processing (NLP) classifier to institution data to determine the model input data. For example, an NLP classifier may learn particular phrases or keywords in a specific context indicating, for example, an association between institution data (e.g., data received at step 402) and a type of general ledger data (e.g., a value related to accounts receivable, which may correspond to a field in a model input). In some embodiments, extracting model input data may include using a mapping between a data element of institution data and a model input data element (e.g., field). For example, the NLP classifier may generate a mapping between an institution data element and a model input, and such a mapping may be used in subsequent data extractions, or other iterations of a step in process 400 (or other process described herein). In some embodiments, processing device 114 may (e.g., using an NLP classifier) use text data (e.g., a general ledger account description) to construct and/or update a tree data structure representing institution data (e.g., a general ledger). Processing device 114 may extract a number of different model inputs for generating risk analysis information. For example, in contexts related to financial institutions, processing device 114 may extract model inputs from a general ledger of a bank or other financial institution. Continuing this example, processing device 114 may extract a cash management subledger from a general ledger.
Model inputs may also include an account value, a transaction value, an asset value (e.g., home value), a current default rate, a current delinquency rate, a historical default rate, a historical delinquency rate, a payment date, a loan term, a loan type, a loan payment history (e.g., including a principal issuance, a payment received, a late charge, a number of days past due, a call code), an individual demographic trait (e.g., income amount), an economic statistic, a credit history, a credit score (e.g., at loan origination), a geographical identifier (e.g., zip code, city, state), ledger data (e.g., an income amount, an expense amount, an asset amount, a liability amount), a call report, an institution (e.g., bank) failure list, a capital ratio, a liquidity amount, a deposit amount, and/or an enforcement action indicator. In some embodiments, extracted model inputs may be labeled and/or used as inputs for training a model.
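By way of a non-limiting illustration, the reuse of a mapping between institution data elements and model input fields may be sketched as follows. This sketch substitutes a fixed keyword table for the NLP classifier described above; the table `KEYWORD_TO_FIELD`, the field names, the function names, and the sample ledger rows are all hypothetical and are not part of the disclosed embodiments.

```python
# Hypothetical keyword-to-field table. In the disclosure, an NLP classifier
# would learn such associations; a fixed table stands in for it here.
KEYWORD_TO_FIELD = {
    "accounts receivable": "accounts_receivable",
    "accounts payable": "accounts_payable",
    "fixed assets": "fixed_assets",
}


def build_mapping(account_descriptions):
    """Map each ledger account description to a model input field, if any."""
    mapping = {}
    for description in account_descriptions:
        lowered = description.lower()
        for keyword, field in KEYWORD_TO_FIELD.items():
            if keyword in lowered:
                mapping[description] = field
                break
    return mapping


def extract_model_inputs(ledger_rows, mapping):
    """Sum ledger amounts into model input fields using a stored mapping."""
    inputs = {}
    for description, amount in ledger_rows:
        field = mapping.get(description)
        if field is not None:  # unmapped rows are skipped, not guessed
            inputs[field] = inputs.get(field, 0.0) + amount
    return inputs


# Made-up sample rows: (account description, amount).
rows = [
    ("1200 Accounts Receivable - Trade", 1500.0),
    ("1210 Accounts Receivable - Other", 250.0),
    ("2000 Accounts Payable", 900.0),
    ("9999 Miscellaneous", 10.0),
]
mapping = build_mapping(description for description, _ in rows)
model_inputs = extract_model_inputs(rows, mapping)
```

Because the mapping is built once and stored, a later extraction over the same account descriptions may call `extract_model_inputs` directly without re-running the classification step.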

In some embodiments, processing device 114 may determine that a machine learning model (e.g., a machine learning model implementing process 400) may have insufficient model input data to provide a model output of a threshold confidence. In these embodiments or others, processing device 114 may display a warning or otherwise notify a user (e.g., at user device 300). For example, processing device 114 may provide a user interface allowing a user of processing device 114 (e.g., an instance of user device 300) to request additional information (e.g., institution data, missing structured data information, unknown model inputs, or data undetermined due to an extraction error, etc.). For example, processing device 114 may provide a button within a user interface that, when selected by an input device, will prompt another device (e.g., a device within financial transaction system 102) for data, such as by transmitting an alert to the other device. In some embodiments, processing device 114 may prompt another device to resubmit institution data, such as by aggregating up-to-date transaction data from devices in financial transaction system 102. An example of a button for prompting additional data is shown by the button labeled “Initiate New Records Request” in FIG. 7B. Additionally or alternatively, processing device 114 may (e.g., according to a machine learning model) replace values and/or impute missing values using statistical (e.g., time series analysis) and/or machine-learning approaches using context from an institution or group of institutions (e.g., time of year, past trends, current trend, a model function, etc.).
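As one simple, non-limiting illustration of imputing missing values in a time series, the following sketch fills gaps by linear interpolation between the nearest observed values and carries the nearest observed value to the edges. Linear interpolation is chosen here only as an assumed example of a statistical approach; the disclosure does not specify a particular method, and the function name `impute_missing` is hypothetical.

```python
def impute_missing(series):
    """Fill None entries by linear interpolation between observed neighbors.

    Gaps at the start or end of the series take the nearest observed value.
    """
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("no observed values to impute from")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        before = [kv for kv in known if kv[0] < i]
        after = [kv for kv in known if kv[0] > i]
        if before and after:
            (i0, v0), (i1, v1) = before[-1], after[0]
            filled[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
        else:
            filled[i] = (before[-1] if before else after[0])[1]
    return filled


# Made-up monthly balances with two missing periods.
monthly_balances = impute_missing([10.0, None, None, 40.0])
```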

Referring again to process 400, at exemplary step 406, processing device 114 may receive 3rd party data (e.g., from a 3rd party data provider 130). For example, processing device 114 may access supplemental data (e.g., non-institution data, data from a source other than a particular bank, etc.). For example, the supplemental data may be from an additional data source, and may relate to demographics (e.g., life expectancy for a particular geography) or economics (e.g., employment data, income data). 3rd party data may be an important source of additional model inputs, enabling processing device 114 to identify risks (as discussed below) that may otherwise be unapparent.

Referring again to process 400, at exemplary step 407, processing device 114 may perform input feature engineering, which may involve transforming raw data into more informative features, which may be used to improve a machine learning process. For example, input feature engineering may include any combination of handling missing values or low quality data, such as by leveraging statistical imputation methods, transforming categorical data values into an appropriate format for statistical and/or machine learning models to process, scaling numerical values, normalizing data coming from different sources, creating new dynamic feature sets such as time lags or delta shifts between periods, determining simple moving averages or exponential moving averages, determining volatility or ranges in an input variable to describe time series data, and/or another data refinement operation. Feature engineering approaches may include both modifying input data and creating new, derived data based on the given input data.
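Several of the feature engineering operations listed above may be sketched, by way of example only, using standard-library Python. The function names, the smoothing factor `alpha`, and the sample balances below are assumptions made for illustration and do not reflect any particular parameterization of the disclosed embodiments.

```python
from statistics import pstdev


def deltas(values):
    """Period-to-period change (delta shift) between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def lag(values, periods=1):
    """Time-lagged copy of the series (assumes periods < len(values))."""
    return [None] * periods + list(values[:len(values) - periods])


def simple_moving_average(values, window):
    """Mean of each trailing window; output is shorter by window - 1."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, alpha):
    """EMA seeded with the first value; alpha is the smoothing factor."""
    ema = [values[0]]
    for value in values[1:]:
        ema.append(alpha * value + (1 - alpha) * ema[-1])
    return ema


def min_max_scale(values):
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up series, e.g., an accounts receivable balance over four periods.
balances = [100.0, 110.0, 105.0, 120.0]
features = {
    "delta": deltas(balances),
    "lag_1": lag(balances, 1),
    "sma_2": simple_moving_average(balances, 2),
    "ema": exponential_moving_average(balances, alpha=0.5),
    "scaled": min_max_scale(balances),
    "volatility": pstdev(balances),       # population standard deviation
    "range": max(balances) - min(balances),
}
```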

Referring again to process 400, at exemplary step 408, processing device 114 may apply a risk model (e.g., a machine learning model) to the extracted model data. For example, processing device 114 may apply a risk model to the extracted model input data to predict a risk level associated with an entity, such as a first entity associated with the first institution data received at step 402. In some embodiments, a risk model may include a z-score model, which may produce a risk score and/or z-score for an entity, such as a bank. In some embodiments, the risk model may be a machine learning model that has been trained to predict risk levels using second institution data, which may have been received from the first entity and/or a second entity. For example, processing device 114 may operate a risk model that is trained and/or re-trained using institution data from one or multiple financial institutions, such as banks. Processing device 114 may operate a risk model whenever new data is received and/or periodically (e.g., daily, weekly, monthly).

In some embodiments, a risk model may use a combination of model inputs to generate an intermediate output. For example, a risk model may aggregate individual loan values to determine an impact to a liability value for an entity (e.g., bank). As another example, a risk model may apply an algorithm to extracted data to determine information associated with a particular bank, such as an amount of liquidity or total loan amounts owed. As yet another example, a risk model may filter model inputs to result in an intermediate output of data relating to a specific geographic area, which may have been selected by a user. A risk model may also calculate a change in a particular value over a period of time, such as, for example, a change in an accounts receivable amount over a past month.

The risk model may use a combination of model inputs and/or intermediate outputs to generate final outputs (e.g., analysis results). In some embodiments, the risk model may identify at least one correlation between at least one model input, or at least one change in at least one model input, and a failure, or riskiness, of a transaction, an asset, or an entity. For example, the risk model may be a machine learning model that is trained to predict a risk level based on a change in activity of an institution data source entity (e.g., a document source entity). Continuing this example and without limitation, the risk model may identify a correlation between a rate of change in loans closed over a period of time and a likelihood of an entity failure (e.g., a bank failure). Of course, categories of model inputs and/or intermediate outputs may be relatively broad (e.g., liquidity information, earnings information, credit risk information) or granular (e.g., residential real estate lending information, money market deposit values, cash position information, etc.) with respect to an institution.

In some embodiments, the risk model may apply statistical weighting and/or outlier approaches such as standard deviations, z-scores, and other statistical distributions, to factor multiple underlying risk components into composite risk scores. For example, the risk model may predict a risk score or probability, which may correspond to a risk level (e.g., range of risk scores, which may be denoted as “high”, “moderate”, “low”, etc.), and which may be included in analysis results. In some embodiments, processing device 114 may describe a risk score or risk level relative to a defined value (e.g., fixed value, variable, etc.), or may describe a risk score or risk level relative to risk scores or levels for other entities. For example, in some embodiments, processing device 114 may compute z-scores for one or more entities, and certain ranges of z-scores may correspond to a risk level. For example, a z-score of greater than zero and less than two may be considered low risk, a z-score of greater than or equal to two and less than or equal to 3.5 may be considered moderate risk, and a z-score greater than 3.5 may be considered high risk.
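The example z-score ranges above may be expressed, purely for illustration, as a small mapping function. The sketch assumes that the z-score is computed against the population of entities being compared and that higher values indicate higher risk. The example ranges above do not assign a level to z-scores at or below zero, so the sketch returns a placeholder label (`"unspecified"`, an assumption) for that case. The function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values_by_entity):
    """Z-score of each entity's value relative to the whole group."""
    values = list(values_by_entity.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All entities identical: no entity deviates from the group.
        return {name: 0.0 for name in values_by_entity}
    return {name: (v - mu) / sigma for name, v in values_by_entity.items()}


def risk_level(z):
    """Map a z-score to a risk level using the example ranges above."""
    if z > 3.5:
        return "high"
    if z >= 2:
        return "moderate"
    if z > 0:
        return "low"
    # Assumption: the example ranges do not cover z <= 0.
    return "unspecified"


# Made-up risk component values for three hypothetical entities.
scores = z_scores({"bank_a": 1.0, "bank_b": 2.0, "bank_c": 3.0})
levels = {name: risk_level(z) for name, z in scores.items()}
```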

In some embodiments, the risk model may generate analysis data based on a predicted risk level. For example, the analysis data may include the predicted risk level. In some embodiments, a first model may be configured to generate an event-based classification output and a second model may be configured to generate a likelihood (e.g., probability) score (discussed above). For example, the first model may generate an event-based classification output that predicts an occurrence of an event (e.g., an expected default on a loan, a delinquency on a loan and significant change on a general ledger position, a significant outflow of deposits, a significant shift from less risky to more risky products). In some embodiments, processing device 114 may consolidate predicted risk events and risk probabilities/ratios into higher-level risk scores, such as by utilizing statistical approaches. In some embodiments, a risk score may indicate a likelihood that a transaction, asset, or entity will fail (e.g., 30% chance a loan will be in default in the future), and the corresponding risk level may comprise a likelihood of failure (e.g., of a first entity). In some embodiments, processing device 114 may deploy a machine learning model to predict (e.g., using a labeled time-series data set for institution and/or asset failures) a time in the future when the failure will occur, and may include this predicted value such that generated analysis data comprises a predicted amount of time until the failure of the first entity. Additionally or alternatively, a risk model may predict a change to at least one model input that may reduce a risk score, and may designate such a change as a recommendation with analysis results. Processing device 114 may provide different recommendations depending on a generated model output.
For example, processing device 114 may generate a recommendation (e.g., for display at a user device 300) that an entity reduce its level of liabilities, which may be determined from institution data (e.g., a machine learning model may understand that liabilities have increased based on changes in general ledger data), to reduce a predicted risk of failure.

In some embodiments, the risk level may be predicted by applying the machine learning model to supplemental data. By way of example, processing device 114 may apply a machine learning model to Department of Labor statistics and identify a correlation between individuals earning a particular amount of income in a particular geographical area and a likelihood of loan repayment, which may in turn impact a likelihood of failure of an entity (e.g., a bank). Additionally or alternatively, processing device 114 may receive data from other entities (e.g., banks) similar to an entity providing the institution data.

In some embodiments, based on the analysis data, processing device 114 may transmit an alert to a management device (e.g., processing device 104) communicably connected to a system (e.g., activity analysis platform 110). In some embodiments, processing device 114 may transmit alerts periodically. Additionally or alternatively, processing device 114 may transmit alerts when a transmission criterion is satisfied. For example, processing device 114 may transmit an alert when a generated risk level exceeds a threshold (e.g., is in a range above “low”). In some embodiments, an alert transmission threshold may be set by a user at a management device.
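A threshold-based transmission criterion of the kind described above may be sketched as follows. The ordering of levels in `RISK_ORDER`, the function names, and the alert payload format are assumptions made for illustration; an actual embodiment may use a different criterion or message structure.

```python
# Assumed ordering of risk levels, from least to most severe.
RISK_ORDER = ["low", "moderate", "high"]


def should_alert(risk_level, threshold="low"):
    """Return True when risk_level is strictly above the threshold level."""
    return RISK_ORDER.index(risk_level) > RISK_ORDER.index(threshold)


def build_alerts(levels_by_entity, threshold="low"):
    """Build one hypothetical alert payload per entity above the threshold."""
    return [
        {"entity": entity, "risk_level": level}
        for entity, level in levels_by_entity.items()
        if should_alert(level, threshold)
    ]


# Made-up risk levels; the threshold could be set by a user at a
# management device.
alerts = build_alerts({"bank_a": "low", "bank_b": "moderate", "bank_c": "high"})
```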

Referring again to process 400, at exemplary step 410, processing device 114 may provide analysis results, which may have been generated as a result of step 412. In some embodiments, analysis results may include any of the risk scores or risk levels described above. In some embodiments, processing device 114 may use the analysis data to generate a graphical user interface, which may include an amount of the analysis data (e.g., a list of institutions and corresponding risk scores) and/or model inputs (e.g., write-offs arranged by recency, loans arranged by loan type, loans arranged by NAICS sector, loans arranged by length of delinquency, etc.). Such a graphical user interface may include filters that may allow a user to select particular analysis results and/or surface data (e.g., model inputs) that impacted the analysis results. For example, a user may select a minimum risk score, and processing device 114 may provide analysis results for only institutions having a risk score at or above the user-selected minimum. In some embodiments, processing device 114 may filter analysis results to only include results for statistical outlier model outputs. Additionally or alternatively, the analysis results may include a graph, such as a line graph, that may chart a variable over time, such as a total value of outstanding loans, a number of loans opened, a number of loans closed, a number of new locations (e.g., bank branches opened), or any other information related to the model inputs discussed above. Additionally or alternatively, the analysis results may include a map, which may include a number of indicators placed on locations of areas of interest, such as locations of bank branches at a particular risk of failure. Additionally or alternatively, analysis results may include aggregated general ledger data for a bank or other institution, which may include changes to interest income, non-interest income, interest expenses, non-interest expenses, and/or other general ledger categories.
In some embodiments, graphs and visualizations may be connected and surfaced depending on user interaction, allowing ad hoc exploration. For example, a user may select a graphical element (e.g., institution identifier) on a first user interface (e.g., a list of institutions and corresponding risk scores), which may surface a second user interface with different information, which may be specific to an institution (e.g., a graph of risk score changes over time, graphical indicators of data inputs underlying a risk score, a graphical element that launches a communication interface with the institution, etc.). As another example, a drill-down user selection on a chart of period-to-period change may reveal a detailed chart of changes in underlying, more detailed data categories, such as loan growth in a particular segment or deposit outflows in a particular type of account. In some embodiments, analysis results may include information from a third-party data source, which may be an entity not associated with institutions for whom risk scores are generated. For example, processing device 114 may use an API to crawl data from a source of public corporate or regulatory filings (e.g., for inserting missing structured data for a user interface), latitude-longitude data (e.g., for generating a map of locations of interest), and the like. Processing device 114 may also generate mappings between unstructured information (e.g., document data associated with loans) and structured information (e.g., an asset described in a general ledger). FIGS. 7A-7D show yet additional examples of user interfaces that may present analysis results.

In some embodiments, processing device 114 may apply a natural language generation (NLG) process to model output from the machine learning model to produce at least one phrase, which may be included in the analysis results. For example, processing device 114 may apply an NLG process to a risk level output at step 412, which may generate a phrase helping a user to understand the analysis results. By way of example, applying an NLG process in this context may generate a phrase such as “risk level elevated to moderate one week ago,” “consider monitoring more closely,” or any of the phrases shown in FIGS. 7A-7D (e.g., “within the liquidity Z score, the most significant negative factor was a decrease in the Retained Earnings/Total Assets ratio”).

Referring again to process 400, at exemplary step 412, processing device 114 may update a model. For example, processing device 114 may modify at least one model parameter based on a model output and/or user input. By way of example, processing device 114 may modify at least one model parameter based on a model output predicting that a particular bank will fail and a user input that the bank did not fail, or did not fail within a predicted timeframe. In some embodiments, processing device 114 may update a model based on data and/or user inputs from multiple entities, such as different financial transaction systems 102, which may be associated with multiple institutions (e.g., banks) across different geographies, which may maintain different assets, liabilities, etc. Regularly collecting new data (e.g., model inputs, model outputs) may allow processing device 114 to maintain a more robust model to identify institutional risks before they are realized.

FIG. 5 is a flowchart of example process 500 for analyzing document data, consistent with the disclosed embodiments. Process 500 may be performed by a computer-implemented system (e.g., server 200) in financial transaction system 102 or activity analysis platform 110, or by an apparatus (e.g., user device 300). The computer-implemented system may include a memory (e.g., memory 230 or 330) that stores instructions and a processor (e.g., processor 210 or 310) programmed to execute the instructions to implement process 500. Process 500 may be connected to generating and/or displaying certain user interfaces, such as those shown in FIGS. 7A-7D. Process 500 may be implemented as one or more software modules (e.g., an API in document data analyzer 232) stored in memory 230 and executable by processor 210. For ease of description, some steps of process 500 are described as performed by a particular device, such as processing device 104. However, it should be noted that any step may be executed by any device within system architecture 100, such as processing device 114. Process 500 may incorporate aspects of steps from other processes discussed herein. For example, providing the analysis results at step 512 may include aspects of providing analysis results as described with respect to step 410 of process 400.

Referring to process 500 shown in FIG. 5, at exemplary step 502, processing device 104 may access document data. In some embodiments, the document data may be associated with at least one of a transaction (e.g., a loan) or an individual. In some embodiments, the document data may be associated with a financial institution, such as a bank, which may host a financial transaction system 102. In some embodiments, document data may include an image or other digital representation of a physical document (e.g., a PDF document). In some embodiments, the document data may be associated with a particular industry, such as financial services. For example, the document data may be associated with at least one of a financial asset, a financial liability, net income during a time period, earnings during a time period, a loan, a deposit, or an expense.

Document data may include structured and/or unstructured data. As an example of unstructured data, document data may include an image of an individual's signature or handwritten notes (e.g., notes regarding a loan applicant). As an example of structured data, document data may include metadata associated with a document (e.g., a time the document was generated, an individual associated with the document, an institution associated with the document, a product associated with the document, etc.). Such metadata may have been user-generated (e.g., at a user device 300), or machine-generated.

Referring again to process 500 shown in FIG. 5, at exemplary step 504, processing device 104 may classify the document data (e.g., normalized document data from step 504). In some embodiments, such as prior to classifying the document data, processing device 104 may convert unstructured data to structured data. For example, processing device 104 may apply optical character recognition techniques to a document to identify text and create machine-readable text. In some embodiments, processing device 104 may use a machine learning-based classifier (e.g., a random forest classifier) to classify the document data. In some embodiments, classifying the normalized document data may include identifying at least one marker in the first document data. A marker may comprise a word, a phrase, a frequency of text, a position of text relative to a document, a position of text relative to other text in the document, a sentence, a number, a pictographic identifier, or any visual indicator, any of which may be correlated (e.g., using a machine learning model) with a document type (e.g., a loan application, an account opening, a loan closing document, etc.). In some embodiments, a marker may be associated with a document type based on user-created mappings between a marker or combination of markers and a document type. Such mappings may be maintained at memory 230, database 240, or any other storage device. Instead of or in addition to mappings, a marker may be associated with a document type based on a target keyword list or exception. Additionally or alternatively, a marker may be associated with a document type by a machine learning model, which may learn from document classifications and/or marker-document type mappings made by users over time to generate new associations and/or association recommendations.
For example, a model (e.g., analysis model 234) may be improved over time by flagging “false extractions” through user-based reviews of predictions to improve accuracy for types of documents that may be underperforming in an extraction process.
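Classification based on user-created mappings between markers and document types may be sketched, by way of example only, as a count of marker occurrences per document type. This keyword-count approach is a simplification standing in for the machine learning-based classifier (e.g., random forest) described above; the marker lists in `MARKER_MAPPINGS` and the function name are hypothetical.

```python
# Hypothetical user-created mappings from document type to text markers.
MARKER_MAPPINGS = {
    "loan application": ["loan application", "applicant", "requested amount"],
    "loan closing document": ["closing disclosure", "settlement"],
    "account opening": ["new account", "signature card"],
}


def classify_document(text, mappings=MARKER_MAPPINGS):
    """Return the document type whose markers occur most often, or None."""
    lowered = text.lower()
    counts = {
        doc_type: sum(lowered.count(marker) for marker in markers)
        for doc_type, markers in mappings.items()
    }
    best = max(counts, key=counts.get)
    # No marker found at all: leave the document unclassified.
    return best if counts[best] > 0 else None


# Made-up machine-readable text, e.g., produced by an OCR step.
doc_type = classify_document(
    "Loan Application\nApplicant: J. Doe\nRequested amount: 5000"
)
```

A document for which no marker is found returns `None`, which a system could route to user review so that new marker-document type mappings can be added over time.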

Referring again to process 500 shown in FIG. 5, at exemplary step 506, processing device 104 may extract text or other features from document data (e.g., classified document data), which may be used as model input data. For example, processing device 104 may extract text from classified (or unclassified) document data. In some embodiments, processing device 104 may select an extraction model (e.g., a model configured to extract text from document data) from among a plurality of candidate extraction models based on the classified document data. For example, processing device 104 may have access to multiple extraction models that have particularized parameters for different types of documents or different entities (e.g., financial institutions), and may select an extraction model designated (e.g., in a look-up table) for a particular document type (e.g., a loan closing document) and/or entity (e.g., bank), which may have been identified through the document data classification (e.g., at step 504). In some embodiments, processing device 104 may apply a natural language processing (NLP) method to classified document data to determine particular text. For example, an NLP method may learn particular phrases or keywords in a specific context having a higher importance for a document type, or a stronger impact on a model output. For example, processing device 104 may train an NLP model as part of a training stage and/or using new document data as it is received.
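Selecting an extraction model from a look-up table keyed by document type and/or entity may be sketched as follows. The table contents, the fallback order (entity-specific entry, then type-wide entry, then a generic extractor), and the regular-expression extractor standing in for a trained extraction model are all assumptions made for illustration.

```python
import re


def extract_loan_closing(text):
    """Stand-in extractor: pull a loan amount from loan closing text."""
    match = re.search(r"loan amount:\s*\$?([\d,]+(?:\.\d+)?)", text, re.IGNORECASE)
    if not match:
        return {}
    return {"loan_amount": float(match.group(1).replace(",", ""))}


def extract_generic(text):
    """Fallback extractor: return the stripped raw text only."""
    return {"raw_text": text.strip()}


# Hypothetical look-up table keyed by (document type, entity). An entity of
# None designates the default extractor for that document type.
EXTRACTION_MODELS = {
    ("loan closing document", None): extract_loan_closing,
}


def select_extraction_model(document_type, entity=None):
    """Prefer an entity-specific model, then a type-wide one, then generic."""
    for key in ((document_type, entity), (document_type, None)):
        if key in EXTRACTION_MODELS:
            return EXTRACTION_MODELS[key]
    return extract_generic


model = select_extraction_model("loan closing document", entity="Example Bank")
extracted = model("Loan Amount: $250,000.00")
```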

Processing device 104 may extract a number of different document features for generating risk analysis information. For example, in contexts related to financial institutions, extracted document features may include a parameter related to an account value, a transaction value, an asset value (e.g., home value), a payment date, a loan term, a loan payment history (e.g., including a principal issuance, a payment received, a late charge, a number of days past due, a call code), an individual demographic trait (e.g., income amount), an economic statistic, a credit history, a credit score, a geographical identifier (e.g., zip code, city, state), ledger data (e.g., an income amount, an expense amount, an asset amount, a liability amount), a call report, an institution (e.g., bank) failure list, a capital ratio, a liquidity amount, a deposit amount, or an enforcement action indicator.

Referring again to process 500 shown in FIG. 5, at exemplary step 508, processing device 104 may normalize the text or other features (e.g., extracted at step 506) to generate model input data. In some embodiments, normalizing the document data may comprise applying regular expression parsing to extracted text to cleanse the text, which may make it more suitable as model input data. In some embodiments, processing device 104 may place particular text into designated fields. In some embodiments, processing device 104 may perform (e.g., after normalization) a targeted classification operation to map a field and/or text to a document type (e.g., for use in a classifier, such as discussed with respect to step 504). For example, processing device 104 may categorize a field (account or loan type, product type, etc.) using a model that is trained with input data from one or more institutions (e.g., banks).
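
By way of example and not limitation, regular expression cleansing and placement of text into designated fields might be sketched as follows. The field names (`borrower_name`, `loan_amount`, `payment_date`), the input keys, and the accepted date formats are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Hypothetical set of date layouts accepted by this sketch.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%m-%d-%Y")


def clean_text(raw):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


def normalize_amount(raw):
    """Strip currency symbols and separators; return a float or None."""
    cleaned = re.sub(r"[^\d.\-]", "", raw)
    if re.fullmatch(r"-?\d+(?:\.\d+)?", cleaned):
        return float(cleaned)
    return None


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_record(extracted):
    """Place extracted text into designated fields."""
    return {
        "borrower_name": clean_text(extracted.get("name", "")),
        "loan_amount": normalize_amount(extracted.get("amount", "")),
        "payment_date": normalize_date(extracted.get("date", "")),
    }
```

Values that cannot be parsed are returned as `None` in this sketch rather than guessed, so that a later step can treat them as missing model inputs.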

In some embodiments, processing device 104 may determine that a machine learning model (e.g., a machine learning model implementing process 400) may have insufficient model input data to provide a model output of a threshold confidence. In these embodiments or others, processing device 104 may display a warning or otherwise notify a user (e.g., at user device 300). For example, processing device 104 may provide a user interface allowing a user of processing device 104 (e.g., an instance of user device 300) to request additional information (e.g., document data, missing structured data information, unknown model inputs, or data undetermined due to a normalization error, classification error, extraction error, etc.). For example, processing device 104 may provide a button within a user interface that, when selected by an input device, will prompt another device (e.g., a device within financial transaction system 102) for data, such as by transmitting an alert to the other device. In some embodiments, processing device 104 may prompt another device to re-capture document data, such as by re-scanning (e.g., with a document scanner, mobile device camera, etc.) a physical document. An example of a button for prompting additional data is shown by the button labeled “Initiate New Records Request” in FIG. 7B. Additionally or alternatively, processing device 104 may (e.g., according to a machine learning model) replace values and/or impute missing values using statistical (e.g., time series analysis) and/or machine-learning approaches using context from an institution or group of institutions (e.g., time of year, past trends, current trend, a model function, etc.).
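
By way of example and not limitation, detecting missing model inputs, imputing them from an institution's past records, and deciding whether to prompt for additional records might be sketched as follows. The required-input list and function names are hypothetical, a simple historical mean stands in for the statistical or machine-learning imputation approaches mentioned above, and the presence of all required inputs is used only as a stand-in for a threshold-confidence determination.

```python
from statistics import mean

# Hypothetical inputs the model is assumed to need.
REQUIRED_INPUTS = ("loan_amount", "income_amount", "credit_score")


def find_missing_inputs(model_input):
    """List required inputs that are absent or None."""
    return [name for name in REQUIRED_INPUTS if model_input.get(name) is None]


def impute_missing(model_input, history):
    """Fill missing inputs with the mean of past values, where any exist.

    `history` is a list of prior records (dicts) from the same institution.
    Returns (filled_input, names_of_imputed_inputs).
    """
    filled = dict(model_input)
    imputed = []
    for name in find_missing_inputs(model_input):
        past = [r[name] for r in history if r.get(name) is not None]
        if past:
            filled[name] = mean(past)
            imputed.append(name)
    return filled, imputed


def needs_records_request(model_input, history):
    """True if inputs remain missing after imputation, so a user should be prompted."""
    filled, _ = impute_missing(model_input, history)
    return bool(find_missing_inputs(filled))
```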

Referring again to process 500 shown in FIG. 5, at exemplary step 510, processing device 104 may apply a document analysis model to the document data (e.g., classified document data). In some embodiments, processing device 104 may select a machine learning model from among a plurality of candidate machine learning models based on the classified document data. For example, processing device 104 may have access to multiple models that have particularized parameters for different types of documents or different entities (e.g., financial institutions), and may select a machine learning model designated (e.g., in a look-up table) for a particular document type (e.g., a loan closing document) and/or entity (e.g., bank), which may have been identified through the document data classification. In some embodiments, applying the document analysis model to document data may score the document. For example, the machine learning model may have been trained to generate a favorability output indicating a favorability (e.g., predicted revenue to be generated, predicted return on investment, predicted likelihood of repayment, predicted number of late payments, etc.) of the transaction (e.g., a loan application) or individual, and the favorability output may comprise an amount of risk associated with the transaction or individual. In some embodiments, the score of the document may relate to, for example, a predicted likelihood that an individual will pay back a loan, a predicted likelihood and/or frequency of late payments, or a predicted level of added risk to an entity (e.g., a bank). In some embodiments, processing device 114 may implement a state transition model (e.g., a Markov chain model), such as the state transition model shown in FIG. 8.

In some embodiments, the machine learning model may be trained to generate the favorability output using historical data from at least a first financial institution associated with the document data or a second financial institution associated with additional document data. For example, the machine learning model may have been trained using input documents or other input data only from the entity (e.g., bank) from which the document data (e.g., loan data) is accessed at step 502. Additionally or alternatively, the machine learning model may have been trained using input documents or other input data from an entity other than an entity from which the document data was accessed at step 502.

In some embodiments, processing device 104 may apply a document analysis model, or other model, that is trained to predict a change in model input data that will improve the favorability output. For example, a machine learning model may receive some model inputs, such as an age of a loan applicant, but may lack other model inputs, such as an amount of a loan previously paid off by the applicant. The machine learning model may predict that receiving certain additional model inputs (e.g., that the loan applicant paid back a $10,000 loan in the past two years) will lead to a change in the favorability (e.g., a prediction of risk to a bank presented by a loan applicant). In some embodiments, a machine learning model may predict actions that may improve a return on investment (ROI). For example, a machine learning model may learn through an iterative feedback loop of model inputs (e.g., comprising loan application document data, loan payment document data, etc.) that particular combinations of individual traits (e.g., income amount, geographical area, etc.), transaction parameters (e.g., loan amount, loan term, etc.), and the like may be correlated with greater ROI, and may provide corresponding recommendations to a processing device (e.g., processing device 104), based on changes in model inputs predicted to yield a better model output (e.g., a higher ROI).

Referring to process 500 shown in FIG. 5, at exemplary step 512, processing device 104 may provide the analysis results. In some embodiments, processing device 104 may generate analysis data based on scored document data. In some embodiments, providing the analysis results may be based on an alert threshold (e.g., as discussed above with respect to process 400). For example, processing device 104 may determine whether the favorability output satisfies an alert criterion. If the favorability output satisfies the alert criterion, processing device 104 may generate an alert at a display or other output device (e.g., user interface 340). In some embodiments, an analysis result visualization may be connected to another visualization, which may be surfaced depending on user interaction, allowing ad hoc exploration. For example, a user may select a graphical element (e.g., a loan category) on a first user interface, which may surface a second user interface with different information (e.g., a list of loans in the loan category having risk levels beyond a threshold). It is appreciated that analysis results and user interfaces of step 512 may include aspects discussed above with respect to process 400. For example, processing device 104 may provide a map (e.g., a map of bank branches with riskier loan portfolios) as part of the analysis results.
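
By way of example and not limitation, checking a favorability output against an alert criterion might be sketched as follows. The `FavorabilityOutput` structure, the 0-to-1 risk scale, and the default threshold of 0.7 are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class FavorabilityOutput:
    document_id: str
    risk: float  # assumed scale: 0.0 (lowest risk) to 1.0 (highest risk)


def satisfies_alert_criterion(output, risk_threshold=0.7):
    """Hypothetical criterion: predicted risk at or above a threshold."""
    return output.risk >= risk_threshold


def generate_alerts(outputs, risk_threshold=0.7):
    """Return one alert message per output that satisfies the criterion."""
    return [
        f"ALERT: document {o.document_id} has predicted risk {o.risk:.2f}"
        for o in outputs
        if satisfies_alert_criterion(o, risk_threshold)
    ]
```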

Referring to process 500 shown in FIG. 5, at exemplary step 514, processing device 104 may update a model. For example, processing device 104 may modify at least one model parameter based on a model output and/or user input. By way of example and not limitation, processing device 114 may modify at least one model parameter based on a model output predicting that a particular individual will miss a loan payment in the next six months and a user input that the individual made all scheduled payments for six months. In some embodiments, processing device 104 may update a model based on data and/or user inputs from multiple entities, such as different financial transaction systems 102, which may be associated with a same institution (e.g., bank) distributed across different geographies (e.g., different bank branches), and which may maintain different assets, liabilities, etc. Regularly collecting new data (e.g., model inputs, model outputs) may allow processing device 104 to maintain a more robust model to identify a risk presented by a transaction or individual.
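
By way of example and not limitation, modifying model parameters when a user input contradicts a model output might be sketched as follows. The disclosure does not specify a model family or update rule; this sketch assumes a logistic regression model updated by a single stochastic gradient descent step, and all names and the learning rate are hypothetical.

```python
import math


def predict_missed_payment(weights, bias, features):
    """Logistic model: predicted probability that a payment will be missed."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def update_parameters(weights, bias, features, observed, learning_rate=0.1):
    """One gradient step on the log loss for a single reviewed example.

    `observed` is 1.0 if the user reports a missed payment and 0.0 if the
    user reports that all scheduled payments were made.
    """
    error = predict_missed_payment(weights, bias, features) - observed
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias
```

In this sketch, when the model predicted a likely missed payment but the user reports that all payments were made, the update lowers the predicted probability for the same inputs.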

FIG. 6 is a flowchart of example process 600 for coordinating analysis data delivery access, consistent with the disclosed embodiments. Process 600 may be performed by a computer-implemented system (e.g., server 200) in financial transaction system 102 or activity analysis platform 110, or by an apparatus (e.g., user device 300). The computer-implemented system may include a memory (e.g., memory 230 or 330) that stores instructions and a processor (e.g., processor 210 or 310) programmed to execute the instructions to implement process 600. Process 600 may be implemented as one or more software modules (e.g., an API in document data analyzer 232) stored in memory 230 and executable by processor 210. For ease of description, some steps of process 600 are described as performed by a particular device, such as processing device 104 or 114. However, it should be noted that any step may be executed by any device within system architecture 100, such as processing device 114. While process 600 is described with respect to APIs, it should be noted that website uploads, a file transfer protocol (FTP) process using inter-system messages, or any other form of suitable electronic communications may be used.

Referring to process 600 shown in FIG. 6, at step 602, processing device 104 may receive an API request. In some embodiments, the API request may be sent from a requestor device (e.g., processing device 104), and may be received through an API. The API request may be an API request for data and may identify a requestor entity (e.g., a bank) associated with the requestor device. By requesting data using an API, a requestor device may eliminate a need to have a particular program stored locally (e.g., a particular module), which may need frequent updates, or which may pull data at a faster rate than desired, thus unnecessarily burdening bandwidth. Moreover, as further explained below, an API request may be a request for specific datasets, which may reduce the size of datasets that may otherwise be automatically sent to a requestor device. In some embodiments, an API request may include unstructured data (e.g., data from a scanned document), semi-structured data, or structured data.

Referring again to process 600 shown in FIG. 6, at step 604, processing device 114 may determine a data type based on an API request (e.g., received at step 602). In some embodiments, processing device 114 may determine the data type based on at least one data type parameter in the API request. A parameter in the API request may identify at least one of: a timeframe, a geographical area, a financial institution, an asset value, an asset value change, a liability value, a liability value change, a loan, a deposit, an expense, or a risk level threshold. In one embodiment, the API request may be a request for normalized data as a service, which may involve a request to APIs that provide processes and services for generating normalized and high-quality data originating from banking cores and document repositories in a format for further analysis or modeling in a client application or platform (e.g., modeling, visualization, reporting of normalized, granular data, etc.). For example, an API request may have one or more fields, or other data structures, that indicate a particular dataset configuration (e.g., one or more data types) requested. Continuing this example, an API request may indicate a request for an anonymized aggregated dataset of the changes to total assets and liabilities for banks over the past year. In another embodiment, the API request may be a request for risk data as a service, which may involve a request to APIs providing model output, risk scoring output, a list of high-risk accounts and/or loans, and the like, as well as various aggregations of this data such as by geography, institution, peer group, or loan category.

Referring again to process 600 shown in FIG. 6, at step 606, processing device 104 may determine an authorization level of a requestor (e.g., a device from which the API request was received at step 602). In some embodiments, processing device 104 may only allow requestor devices to access certain datasets, depending on the authorization level of the requestor. For example, processing device 104 may maintain (e.g., in database 240) a group of mappings between various authorization levels and data types. By way of example, a “general statistics” authorization level may be mapped to data types such as an average change in new loan offerings over time, but may not be mapped to data types such as geographic filters.

Referring again to process 600 shown in FIG. 6, at step 608, processing device 114 may access corresponding model output data, which may correspond to the data type and authorization level determined at steps 604 and 606. For example, processing device 114 may retrieve data from a data storage device (e.g., database 240), or may generate data (e.g., model output data) on demand. In some embodiments, processing device 114 may determine that the authorization level of the requestor device does not map to a data type in the API request, and may deny access of the requestor device to that data type. Processing device 114 may also deny access where no authorization level is denoted in the API request.
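
By way of example and not limitation, the mappings between authorization levels and data types, and the resulting grant or denial of requested data types, might be sketched as follows. The request field names (`authorization_level`, `data_types`), the data type identifiers, and the "institution analyst" level are hypothetical; only the "general statistics" level is named in the description above.

```python
# Hypothetical mappings between authorization levels and data types.
AUTHORIZED_DATA_TYPES = {
    "general statistics": {"average_new_loan_change"},
    "institution analyst": {
        "average_new_loan_change",
        "geographic_filter",
        "risk_scores",
    },
}


def resolve_access(api_request):
    """Split the requested data types into granted and denied lists.

    A missing or unknown authorization level maps to no data types, so
    every requested data type is denied in that case.
    """
    requested = set(api_request.get("data_types", ()))
    level = api_request.get("authorization_level")
    allowed = AUTHORIZED_DATA_TYPES.get(level, frozenset())
    return {
        "granted": sorted(requested & allowed),
        "denied": sorted(requested - allowed),
    }
```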

In some embodiments, the model output data may have been generated by a machine learning model (e.g., implemented by processing device 114) trained to predict a risk level based on document data. For example, the model output data may comprise analysis results, discussed above with respect to processes 400 and 500. In some embodiments, the document data may be extracted from one or more documents according to a natural language processing (NLP) technique, such as those discussed above with respect to processes 400 and 500. In some embodiments, the model output data may include at least one metric associated with an entity providing the document data. For example, the model output data may include a predicted risk score or risk level, a predicted trend for an institutional metric (assets, liabilities, loans opened, loans closed, financial products sold, etc.), a recommendation for changing an institutional metric based on a predicted model output, or any other data described herein.

In some embodiments, a processing device 114 responding to an API request may apply a machine learning model to predict a change in at least one metric (e.g., institutional metric) based on first and second model output data. For example, a change in at least one metric may be based on first model output data generated by a machine learning model configured to analyze loan applications and second model output data generated by a machine learning model configured to analyze new savings account openings. In some embodiments, processing device 114 may apply a machine learning model that is trained to predict a plurality of risk levels based on the document data (e.g., document data extracted from loan applications, payment confirmations, account opening papers, etc.). In some embodiments, the document data may be from different financial institutions (e.g., banks). Additionally or alternatively, a machine learning model (e.g., a source of the model output data accessed) may be further trained to predict a risk level based on demographic or economic data, as discussed above with respect to process 400.

In some embodiments, processing device 114 may determine a format associated with a requestor device and/or requestor entity. For example, the requestor device (e.g., processing device 104) may host an API not implemented by processing device 114, which may have particular formatting criteria for received data, such that the data is usable by the requestor device API. For example, processing device 114 may change a data sequence, configure data into a particular structure (e.g., table, linked-list, array, stack, queue, tree, graph, etc.), add header information to a data stream, apply a signature operation to data (e.g., hash function), or take any other action to generate a data stream and/or data batch that is usable by a requestor device (e.g., an API of the requestor device). In this manner, disparate systems may be made compatible for effective information exchange.
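
By way of example and not limitation, configuring model output data into a table, adding header information, and applying a hash function might be sketched as follows. The choice of CSV as the table format, SHA-256 as the hash function, and the header keys and function names are assumptions for illustration; a requestor device API may impose entirely different formatting criteria.

```python
import csv
import hashlib
import io


def to_csv_table(records, columns):
    """Arrange records into a table with the column order the requestor expects."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def package_for_requestor(records, columns):
    """Return a data batch with header information and a SHA-256 digest of the body."""
    body = to_csv_table(records, columns)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {
        "header": {"format": "csv", "columns": list(columns), "sha256": digest},
        "body": body,
    }
```

Fields not listed in `columns` are dropped in this sketch, so the same records may be packaged differently for requestors with different formatting criteria.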

In some embodiments, processing device 114 may determine entity-identifying information in the model output data, such as individual names, addresses, Social Security numbers, etc. In some embodiments, entity-identifying information may be associated with individuals who are customers of different financial institutions, but the received API request may be from a single financial institution requesting data generated based on information received from multiple financial institutions. In these or other situations, processing device 114 may anonymize model output data prior to transmitting the model output to the requestor device (e.g., at step 610). In this manner, a single financial institution may be able to access predictive data generated by a machine learning model using de-anonymized model input data from multiple financial institutions, without disclosing any de-anonymized individual or financial institution-specific data.
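
By way of example and not limitation, removing entity-identifying information from model output data prior to transmission might be sketched as follows. The set of identifying field names, the Social Security number pattern, and the salted-hash token are assumptions for illustration. A salted hash is strictly a form of pseudonymization, so an embodiment requiring stronger guarantees may omit the token or aggregate records instead.

```python
import hashlib
import re

# Hypothetical names of fields treated as entity-identifying.
IDENTIFYING_FIELDS = {"name", "address", "ssn", "institution"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymize_record(record, salt):
    """Drop identifying fields, redact SSNs in free text, and add an opaque token."""
    anonymized = {}
    for key, value in record.items():
        if key in IDENTIFYING_FIELDS:
            continue
        if isinstance(value, str):
            value = SSN_PATTERN.sub("[REDACTED]", value)
        anonymized[key] = value
    # The token lets records about the same subject be linked without
    # revealing who the subject is; the salt is kept by the sender.
    source = f"{salt}:{record.get('ssn', '')}:{record.get('name', '')}"
    anonymized["subject_token"] = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
    return anonymized
```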

Referring again to process 600 shown in FIG. 6, at step 610, processing device 114 may transmit corresponding data to the requestor. In some embodiments, processing device 114 may transmit the corresponding data to the same requestor device from which the API request was received, but, additionally or alternatively, may transmit the corresponding data to another device, such as a device associated with a same entity as the requestor device (e.g., another device hosted by a same financial institution as the requestor device). In some embodiments, processing device 114 may transmit a predicted change in at least one metric to a requestor device. In some embodiments, prior to transmitting the first model output, processing device 114 may reformat model output data to satisfy a format associated with the requestor device (as discussed above with respect to step 608).

FIGS. 7A-7D depict example interfaces 700A, 700B, 700C, and 700D, any or all of which may be presented on user device 300, consistent with the disclosed embodiments. For example, user device 300 may be a smartphone associated with a user, and any of interfaces 700A-700D may be displayed on user interface 340 (e.g., a display panel or a touchscreen). Any or all of these user interfaces may include data included in process 400 and/or 500 (e.g., a risk level, a model input value, etc.).

Example interface 700A depicts a ranked list view, which may display a number of institutions (e.g., financial institutions such as banks) and associated information, such as analysis results generated by a machine learning model. For example, interface 700A may rank institutions by an amount of predicted risk, and may include amounts of change in risk over a particular period of time (e.g., three months). Interface 700A may include other information related to a predicted risk or an institution, such as a z-score, a percentile ranking, or an institutional metric (e.g., variance in risk score, total amount of new loans issued, etc.). In some embodiments, interface 700A may include filters, drop-down menus, or other interactable user interface elements, which may allow a user to determine particular criteria for accessing and/or generating certain analysis results. In some embodiments, a processing device (e.g., 104 or 114) may provide any or all of the information displayed in interface 700A (e.g., as part of process 400, 500, or 600). For example, processing device 114 may display model output information in interface 700A at step 414.

Example interface 700B depicts an institution detail view, which may display information associated with a particular institution (e.g., a bank), some or all of which may have been generated by a machine learning model. For example, interface 700B may include an aggregate risk score, credit risk score, earnings risk score, liquidity risk score, or any other metric associated with institutional risk, any of which may be associated with a particular bank. In some embodiments, interface 700B may also include graphs showing a change in risk level (e.g., as determined by a machine learning model according to process 400) over a certain period of time. In some embodiments, interface 700B may also present information in the form of words or graphics that compares particular metrics of one institution to another institution, or to a group of similar institutions (e.g., based on amount of assets, location, etc.). Additionally or alternatively, interface 700B may include text produced through NLG, as described above. In some embodiments, a processing device (e.g., 104 or 114) may provide any or all of the information displayed in interface 700B (e.g., as part of process 400, 500, or 600). For example, processing device 114 may display model output information in interface 700B at step 414.

Example interface 700C depicts an institution dashboard view, which may also display information associated with a particular institution (e.g., a bank), some or all of which may have been generated by a machine learning model. For example, interface 700C may display an overall portfolio risk generated by a machine learning model using model inputs such as amounts and timings of charge-offs, delinquent loan information, loan amounts, types of loans, and the like. Interface 700C may include a search bar that allows a user to search for particular document data (e.g., data extracted from a loan application) associated with an institution (e.g., a bank). In some embodiments, interface 700C may display search result information or a user interface element that, when selected, displays search result information, such as particular financial transactions, institutions, or risk-related information. In some embodiments, interface 700C may display input data to a model, such as a scanned document, structured data associated with a document, and/or requested document data. In some embodiments, a processing device (e.g., 104 or 114) may provide any or all of the information displayed in interface 700C (e.g., as part of process 400, 500, or 600). For example, processing device 114 may display model output information in interface 700C at step 414.

Example interface 700D depicts a search result view, which may display document information associated with one or more institutions. In some embodiments, interface 700D may be displayed in response to a user action taken at another user interface (e.g., a search entered at interface 700C). For example, a user may enter search parameters related to loan information at interface 700C and interface 700D may be generated in response. As seen in FIG. 7D, interface 700D may display information associated with a document or group of documents, such as loans, including a product type, a call code, a name, any of the other column descriptors in FIG. 7D, or any other information describing a trait of a document, which may have been determined according to a combination of OCR, NLP, and machine learning techniques (e.g., according to process 400 or 500, described above). In some embodiments, interface 700D may include one or more buttons or other interactable user interface elements that may provide certain functionality. For example, user interface 700D may include a button that, when selected, generates a virtual binder or adds a data element (e.g., a data element associated with a loan) to a virtual binder. In some embodiments, a processing device (e.g., 104 or 114) may provide any or all of the information displayed in interface 700D (e.g., as part of process 400, 500, or 600). For example, processing device 114 may display model output information in interface 700D at step 414.

FIG. 8 depicts an example diagram of a borrower state transition model 800, consistent with the disclosed embodiments. Borrower state transition model 800 may statistically model (e.g., according to a Markov chain) the likelihood that a borrower will transition between different borrowing states. In some embodiments, transition probabilities (t_(0,1), t_(n,n), etc.) may be based on predictions, which may in turn be based on data extracted from documents (e.g., according to process 400). In some embodiments, borrower state transition model 800 may be implemented through a module, program, application, or other computer code. For example, processing device 114 may execute a module that implements borrower state transition model 800, to predict whether a particular individual or group of individuals may default on a loan. In some embodiments, processing device 114 may implement a module corresponding to borrower state transition model 800 as part of process 400, or any other process described herein. Of course, other stochastic models, or other models altogether, may be used.
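
By way of example and not limitation, a Markov chain over borrowing states might be sketched as follows. The four states and every transition probability below are invented for illustration and are not taken from FIG. 8; in an embodiment, each probability t_(i,j) may instead be based on predictions derived from document data.

```python
# Hypothetical borrowing states; "default" and "paid_off" are absorbing.
STATES = ("current", "late", "default", "paid_off")

# TRANSITIONS[i][j] plays the role of t_(i,j): the probability of moving
# from state i to state j in one period. Each row sums to 1.
TRANSITIONS = {
    "current": {"current": 0.90, "late": 0.07, "default": 0.00, "paid_off": 0.03},
    "late": {"current": 0.50, "late": 0.30, "default": 0.20, "paid_off": 0.00},
    "default": {"default": 1.0},
    "paid_off": {"paid_off": 1.0},
}


def step(distribution):
    """Advance a probability distribution over states by one period."""
    following = {state: 0.0 for state in STATES}
    for source, probability in distribution.items():
        for target, t in TRANSITIONS[source].items():
            following[target] += probability * t
    return following


def distribution_after(start_state, periods):
    """Distribution over states after `periods` steps from `start_state`."""
    distribution = {state: 0.0 for state in STATES}
    distribution[start_state] = 1.0
    for _ in range(periods):
        distribution = step(distribution)
    return distribution
```

For instance, `distribution_after("current", 12)["default"]` would give the modeled probability that a borrower who is current today has defaulted within twelve periods under these assumed probabilities.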

A non-transitory computer-readable medium may be provided that stores instructions for a processor (e.g., processor 210 or 310) for processing a financial transaction according to the example flowcharts of FIGS. 4-6 above, consistent with embodiments in the present disclosure. For example, the instructions stored in the non-transitory computer-readable medium may be executed by the processor for performing processes 400, 500, or 600 in part or in entirety. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid-state drive, magnetic tape, or any other magnetic data storage medium, a Compact Disc Read-Only Memory (CD-ROM), any other optical data storage medium, any physical medium with patterns of holes, a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a FLASH-EPROM or any other flash memory, Non-Volatile Random Access Memory (NVRAM), a cache, a register, any other memory chip or cartridge, and networked versions of the same.

While the present disclosure has been shown and described with reference to particular embodiments thereof, it will be understood that the present disclosure can be practiced, without modification, in other environments. The foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limited to the precise forms or embodiments disclosed. Modifications and adaptations will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments.

Computer programs based on the written description and disclosed methods are within the skill of an experienced developer. Various programs or program modules can be created using any of the techniques known to one skilled in the art or can be designed in connection with existing software. For example, program sections or program modules can be designed in or by means of .Net Framework, .Net Compact Framework (and related languages, such as Visual Basic, C, etc.), Java, C++, Objective-C, Hypertext Markup Language (HTML), HTML/AJAX combinations, XML, or HTML with included Java applets.

Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations as would be appreciated by those skilled in the art based on the present disclosure. The limitations in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application. The examples are to be construed as non-exclusive. Furthermore, the steps of the disclosed methods, or portions of the steps of the disclosed methods, may be modified in any manner, including by reordering steps, inserting steps, repeating steps, and/or deleting steps (including between steps of different exemplary methods). It is intended, therefore, that the specification and examples be considered as illustrative only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.

1-20. (canceled)
 21. A system for risk prediction, the systemcomprising: at least one processor; and a non-transitorycomputer-readable medium containing instructions that, when executed bythe at least one processor, cause the at least one processor to performoperations comprising: classifying document data by identifying at leastone marker in the document data, the at least one marker beingassociated with a document type; selecting an extraction model based onthe document type; extracting model input data from the classifieddocument data using the extraction model; applying a machine learningmodel to the extracted model input data to score the document data, themachine learning model having been trained with document data of a samedocument type as the document type associated with the at least onemarker; and generating, based on the application of the machine learningmodel to the extract model input data, a favorability output based on anamount of risk associated with the document data.
 22. The system ofclaim 21, wherein classifying the document data comprises performing anoptical character recognition (OCR) technique to a document to createmachine-readable text.
 23. The system of claim 21, wherein classifyingthe document data includes using a random forest classifier.
 24. Thesystem of claim 21, wherein the at least one marker comprises one ormore of a word, a phrase, a frequency of text, a position of textrelative to a document, a position of text relative to other text in adocument, a sentence, a number, or a pictographic identifier.
 25. Thesystem of claim 21, wherein the at least one marker is associated withdocument type by a machine learning model trained with user-createdmappings between markers and document types.
 26. The system of claim 21,the operations further comprising determining that the machine learningmodel has insufficient model input data to provide a model output of athreshold confidence.
 27. The system of claim 26, the operations furthercomprising notifying a user to provide additional model input data. 28.The system of claim 26, the operations further comprising prompting adevice to re-capture document data.
 29. The system of claim 21, whereinthe favorability output comprises an amount of risk associated with atransaction or individual associated with the document data.
 30. Thesystem of claim 21, wherein the machine learning model is trained usinginput data from an entity other than an entity from which the documentdata was accessed.
 31. The system of claim 21, the operations further comprising applying a prediction model to predict a change in model input data that will improve the favorability output.
 32. The system of claim 31, the operations further comprising training the prediction model to learn, through an iterative feedback loop of model inputs, combinations of individual traits and transaction parameters correlated with improved favorability outputs.
 33. The system of claim 21, the operations further comprising modifying at least one model parameter based on the favorability output.
 34. The system of claim 21, the operations further comprising modifying at least one model parameter based on a user input.
 35. The system of claim 21, the operations further comprising updating the machine learning model based on inputs received from multiple systems.
 36. A method for risk prediction, comprising: classifying document data by identifying at least one marker in the document data, the at least one marker being associated with a document type; selecting an extraction model based on the document type; extracting model input data from the classified document data using the extraction model; applying a machine learning model to the extracted model input data to score the document data, the machine learning model having been trained with document data of a same document type as the document type associated with the at least one marker; and generating, based on the application of the machine learning model to the extracted model input data, a favorability output based on an amount of risk associated with the document data.
 37. The method of claim 36, wherein classifying the document data comprises performing an optical character recognition (OCR) technique to a document to create machine-readable text.
 38. The method of claim 36, wherein the document data is classified by a random forest classifier.
 39. The method of claim 36, wherein the at least one marker comprises one or more of a word, a phrase, a frequency of text, a position of text relative to a document, a position of text relative to other text in a document, a sentence, a number, or a pictographic identifier.
 40. The method of claim 36, wherein the machine learning model is trained using input data from an entity other than an entity from which the document data was accessed.